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A significant problem with most functional data analyses is that 
of misaligned curves. Without adjustment, even an analysis as simple 
as estimation of the mean will fail. One common method to synchro- 
nize a set of curves involves equating "landmarks" such as peaks or 
troughs. The landmarks method can work well but will fail if marker 
events can not be identified or are missing from some curves. An al- 
ternative approach, the "continuous monotone registration" method, 
works by transforming the curves so that they are as close as pos- 
sible to a target function. This method can also perform well but 
is highly dependent on identifying an accurate target function. We 
develop an alignment method based on equating the "moments" of 
a given set of curves. These moments are intended to capture the 
locations of important features which may represent local behavior, 
such as maximums and minimums, or more global characteristics, 
such as the slope of the curve averaged over time. Our method works 
by equating the moments of the curves while also shrinking toward 
a common shape. This allows us to capture the advantages of both 
the landmark and continuous monotone registration approaches. The 
method is illustrated on several data sets and a simulation study is 
performed. 

1. Introduction. The fundamental paradigm of functional data analysis 
(FDA) involves treating the entire curve or function as the unit of observation 
rather than individual measurements from the curve [Ramsay and Silverman 
(2005)]. As FDA has become more common, many statistical analysis tech- 
niques have been adapted to the paradigm. The analysis of functional data 
possess a number of problems not generally encountered with more classical 
data. One of the most important is that of misaligned curves. Figure 1 pro- 
vides a real world illustration of this difficulty using the acceleration curves 
of ten boys from the Berkeley growth curve study [Tuddenham and Snyder 
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(1954)] where the heights of individuals were recorded at regular intervals 
until age 18. Figure 1(a), which plots smoothed versions of the observed ac- 
celeration curves, shows a clear trend of positive and then negative accelera- 
tion during the teenage years. However, the onset times, and spread, of these 
growth spurts can differ by several years so the curves can be considered to 
be misaligned or "unsynchronized." The dashed line, which represents the 
cross-sectional mean based on the observed curves, clearly fails to capture 
the height of the peaks and troughs and underestimates the rate of change 
in the acceleration curve during the puberty years. Figure 1(b) plots the 
corresponding curves after synchronization using the approach developed in 
this paper. Now one can much more clearly discern the general shape of the 
curves and the gray line, which represents the mean from the synchronized 
curves, shows that the true peaks and troughs are considerably more ex- 
treme than was previously apparent. Computing the mean of a set of curves 
is only one example of the many applications for which proper alignment of 
the curves is an essential component. For example, functional principal com- 
ponents analysis [James, Hastie and Sugar (2000) and Rice and Wu (2001)], 
regression with both functional responses [Zeger and Diggle (1994)] and 
functional predictors [Ferraty and Vieu (2002) and James and Silverman 
(2005)], functional linear discriminant analysis [James and Hastie (2001) 
and Ferraty and Vieu (2003)] and functional clustering [James and Sugar 
(2003) and Bar-Joseph et al. (2003)] all assume that the starting curves are 
correctly aligned on the x-axis. 

The problem of realigning such curves has been studied under different 
names in several fields. In the statistics literature it is referred to as curve 
registration [Silverman (1995) and Ramsay and Li (1998)] or, in the con- 
text of computing an average curve, structural averaging [Kneip and Gasser 
(1992)]. It is also called curve alignment in biology and time warping in en- 
gineering [Sakoe and Chiba (1978)]. Any set of curves can be decomposed 
into "amplitude" functions, which measure differences in the y-axis, and 
"warping" functions, which measure differences in location on the x-axis. 
Synchronization requires estimation of the warping functions. 

A number of approaches have been proposed for this problem. Marker, 
or landmark, registration [Kneip and Gasser (1992)] involves selecting com- 
mon features in the data, such as peaks or troughs, and transforming time 
so that these features occur together. This method can work well when such 
features can be easily identified, but tends to perform poorly if no obvious 
and consistent landmarks exist. In addition, the landmarks often need to 
be manually identified, preventing the implementation of a fully automatic 
approach. An alternative method involves aligning curves using a target 
function. Silverman (1995) proposed registering curves using a simple shift 
in time such that the average squared distance between each curve and 
a target function is minimized. This idea was extended in Ramsay and Li 
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Fig. 1. Ten acceleration curves from the Berkeley growth curve study, (a) unsynchro- 
nized curves and (b) after alignment. The dashed lines represent the cross-sectional mean 
based on the observed curves while the gray solid line corresponds to the mean from the 
synchronized data. 



(1998) using a Procrustes type fitting procedure on a general nonlinear class 
of time transformations to provide maximal alignment to the target func- 
tion subject to suitable smoothness of the transformations. This approach, 
called "continuous monotone registration," is often very effective but de- 
pends heavily on the target function. Generally the cross-sectional mean is 
used, which can provide misleading results if the curves are significantly mis- 
aligned. Other recent work in this area includes Kneip et al. (2000), R0nn 
(2001) and Gervini and Gasser (2005). 

The aim of this paper is to develop an automatic synchronization method 
that incorporates the best properties of both the landmark and continuous 
monotone registration approaches. We start by calculating "moments" for 
each curve. These moments are intended to capture the locations of impor- 
tant features which may represent local behavior, such as maximums and 
minimums, or more global characteristics, such as the slope of the curve 
over time. We then synchronize the curves by both, equating the moments 
for each curve, which has the effect of aligning common features, and simul- 
taneously shrinking toward a common shape. In situations where obvious 
marker events are present, our approach has the same advantages as land- 
mark registration. Additionally, when there are no events but an accurate 
target function can be estimated, we will achieve similar performance to the 
continuous monotone registration method. However, we show through the 
use of theory, simulations and real world examples that even in situations 
where the landmark and continuous monotone registration procedures fail, 
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that is, where obvious markers do not exist and an accurate target function 
can not be computed, our moments based method can still perform well. 

General definitions for the moments of an arbitrary function are devel- 
oped in Section 2. These moments are defined in terms of "feature" functions 
which can be designed to detect both local and global characteristics of the 
curves. In Section 3 we provide a model for the observed or unsynchronized 
curves. The moments from Section 2 are included as a fundamental part 
of the model. We also discuss alternative types of warping functions, both 
linear and nonlinear. A synchronization procedure to fit our model is pre- 
sented in Section 4. Our procedure attempts to (a) equate the moments 
among the curves, and hence, align the common "features" in analogy with 
landmark registration, and (b) shrink the curves toward a common shape in 
analogy to the continuous monotone registration approach. We provide an 
algorithm for implementing our method and demonstrate that the estimates 
are consistent. Our method is illustrated on the Berkeley growth curve data 
in Section 5. The results from several simulation studies, comparing the per- 
formance of our approach with other synchronization methods, are reported 
in Section 6. Finally, Section 7 discusses the relationship of our approach to 
other common methods and suggests some further extensions. 

2. Defining the moments of a function. In this section we develop def- 
initions for the moments of an arbitrary function, g, in analogy with the 
moments of a random variable. The fundamental idea is that, just as one 
can define the distribution of a random variable through its moments and 
equate two different distributions by transforming to equate the moments, 
we can also define the shape of a function through its moments and synchro- 
nize two curves by equating their moments. We first introduce the concept 
of a "feature function," I g (t), for g and impose the constraints 



which ensure that I g is a weighting function. There are various possible 
choices for I g (t). Depending on the properties of our data, we may wish to 
utilize a function that places high weight on the time points corresponding 
to local features, such as maximums or minimums, or alternatively use a 
function that places weight according to more global characteristics such as 
the slope at a given time. 

First, we discuss local approaches where most of the weight is concen- 
trated around the time points corresponding to a specific feature in the 
data. For example, as r — > oo, /™ ax (i) oc (g(t) — mm{g(t)}) r and /™ m (i) <x 
(max{g(t)} — g(t)) r will respectively concentrate their weight on the global 
maximum and minimum of g(t). We may wish to search for local, as well as 
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global, maximums and minimums. In this case one could utilize 



0, 9 {2) (t) = 0. 

This function places maximum weight on points where the first derivative 
is zero. However, I^ ocal (t) is also high for points with a low first derivative 
but a high second derivative. Thus, the function effectively searches for lo- 
cal maximums or minimums where g is changing most rapidly. As r — > oo, 
jiocai^ w -jj pj ace a \\ its weight on the regions around local turning points. 

Finally, we examine Ig m \t), which places weights according to the absolute 

mth derivative of the curve, g^ m > , that is, 4 m) (£) oc \g( m > (t) | . With m = 0, 
this function puts highest weight on large absolute values of g. With m = 1, 
most weight is placed on time points where g has a large slope and would 
be used when we are most interested in regions where a curve is chang- 
ing rapidly. Setting m = 2 searches for points with greatest curvature etc. 
Ig™ 1 ^ (t) can be considered to be searching for global characteristics of a curve 
because it is likely to spread its mass over all time points. 

Then, for a given choice of I g , we define the first moment of g by 

fJtW = j tlg(t)dt 

and the kth central moment by 



u {k) 



J(t-^) k I g (t)dt, k>2. 



/ig 1 ^ provides a measure of the center of g on the time axis, while fi g 2 ^ mea- 
sures variability in g. Note that the variability is measured in relation to the 
time axis and not the y, or amplitude, axis. A curve could vary significantly 
in the y-axis, but still have a low value for fjg . In general, p!p will be more 
useful than the higher order moments when using feature functions such as 
jmax or jmm faat concentrate on local features. The higher-order moments, 

that is, fi g k ^ for k>2, increase in importance when using more global feature 

functions such as Ig . 

To better understand the properties of yS k \ we examine the relationship 
between the moments of a function h(s) and those of the shape invariant 
function h(^^). In this formulation, h(s) is stretched, about s = 0, by a 

factor b and shifted to the right by a. Hence, since fi^ is a measure of the 
center of a function and is a measure of variability about the center, 
stretching by a factor b should multiply the first moment by b and the 
fcth moment by b k . For example, one would expect that l^^ s _ a y^y which 
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measures the variability of the transformed function, would equal b ' fJ>W s y 
Similarly, a shift of a should add a to the first moment and leave the higher- 
order moments, which are centered around the first moment, unchanged. 
We express this mathematically as 

(1) Vh((s-a)/b)= b Vh(s)+ a and Vhls-a)/b)= hk Vhl), k>2. 

Theorem 1 shows that, provided we utilize a certain family of feature func- 
tions, these properties will hold. 

Theorem 1. Suppose that I g (t) is chosen such that 

(2) Ig((s-a)/b)(t) -00<f<00, 

for all a,g and b > 0. Then (1) will hold for any function h{s). 

Condition (2) holds for many large classes of feature functions. In partic- 
ular, the previously mentioned feature functions all satisfy (2) and, hence, 
their corresponding moments all possess the desirable properties given by 

(!)• 

Corollary 1. When utilizing Ig, 7™ ax , /™ m or I g ocal , condition (2) is 
satisfied, and hence, (1) holds. In addition, (2) is satisfied for any Ig(t) oc 
4>(g(t)) where <fr(t) is an arbitrary function. 

The feature functions we have utilized represent only a few of the possible 
choices one could utilize. In fact, one of the strengths of our approach is the 
ability to design functions which best suit one's particular data. 

3. The synchronization model. Let Yx(t), Yi {t),... , ljv(i) represent the 
unsynchronized functions or curves with Yi observed at ti,...,t n where 
tj € [0, T]. Suppose we select L feature functions, Ig,...,Ig, and associated 

moments, fig , ^g 2 ' k \ ■ ■ ■ , Z^'^- Then our synchronization model is given 
by 

(3) Yitj) = ZiiWiitj)) + Eij, i = l,...,N, 



(4) 



(Z,fc) _ (l,k) _ _ (l,k) _ (Z,fc) 
Mzj — ^Z 2 — ■ ■ ■ — ^z N — ' 

I = 1, . . . , L and k = 1, . . . , Ki, 

where /^'^ = jfY^i^y an d Zi(t) represents an "amplitude function," 
which is stretched on the time axis according to a strictly increasing "warp- 
ing function," Wi{t). In addition, represents i.i.d. random measurement 
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errors with Eeij = and Var(ey) = a 2 < oo. Note that we have assumed that 
the curves are all observed at a common set of points simply for notational 
convenience. There is nothing in our approach that will prevent it working 
on curves observed at differing time points. 

As with all curve registration methods, (3) has an identifiability problem 
between Z{ and W{. Landmark registration achieves identifiable results by 
assuming certain markers align for every curve. We generalize this approach 
using the moments condition given by (4) which forces the Z^s to have a 
common "shape." For example, if /™ ax (t), which searches for global maxi- 
mums, is chosen as the feature function, then (4) states that the Z^s have a 
common shape in as much as their global peaks occur at the same time point 
and that point is equal to the average of the peaks in the observed curves, 
Yi. As more feature functions are chosen, (4) forces more alignment in the 
Zi's. Landmark registration can be seen as a special case of (4) because 

H z can be used to identify specific marker events in each curve, such as 
peaks or troughs, in which case (4) simply forces an alignment of landmarks. 

However, fi z can also be used to measure more general and more global 

curve characteristics such as the mth derivative as discussed in Section 2. 

(I k) 

Note that by equating the moments for each curve to fj,^.' we are assuming 
that positive and negative warping cancels out, in terms of the moments, 
when averaged over all curves. Without this assumption, Zi and W% will not 
be identifiable. 

We model Z{ and W{ using finite-dimensional basis functions. The ampli- 
tude function is modeled as Zi(t) = z(t) T Oi, where z(t) is a p-dimensional 
basis function and 6i represents the corresponding basis coefficients. In the 
case of the warping functions, since they are restricted to be increasing, we 
can, without loss of generality, reparameterize them using 

(5) Wi(t) = 7*0 + / exp(/<(s)) ds, 



where 7^0 and fi are unconstrained. As with the amplitude functions, we 
model / using a finite-dimensional basis, fi(s) = w(s) T 7,, where w(s) is 
a ^-dimensional basis and 7, the corresponding coefficients. Several special 
cases of (5) can be achieved by appropriately restricting the 7 i coefficients. 
We shall explore two in this paper. The first is the linear warping function 
Wi(t) = on + Pit which is achieved by setting /j equal to a constant. The 
second is 

(6) Wm = TJ T ° eMMs))dS . 

Jo exp(fi(s))ds 

Equation (6) has the often desirable property that Wj(0) = and Wi(T) = T, 
which means that time is taken to run over a consistent time period for all 
curves. We utilize b-spline bases for both z and w but, in principle, any 
finite-dimensional basis will suffice. 
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4. Curve alignment. In this section we detail our curve alignment ap- 
proach for fitting the model from Section 3. 

4.1. A moments based alignment approach. The aim in fitting our model 
is to produce estimated curves, Yij = z(Wi(tj)) T 6i, that accurately approx- 
imate the observed curves, = Yi(tj), subject to two constraints. First, 
the shape of the synchronized curves, Zi(t), should be as close as possi- 
ble to that of the original curves. Notice that if W((t) = 1 for all values 
of t, then Zi(t) will have an identical shape to Yi(t). Therefore, we mea- 
sure the change in shape by examining the departure of W((t) from 1 using 
P(Wi) = (/{[^'(t)]" 1 - l}dt) 2 and, hence, choose a fit such that P(Wi) is 
small. We penalize the inverse of W[{t) to ensure slopes close to zero, which 
would imply an extremely high level of warping, are strongly discouraged. 
Second, the shapes of the Zj(t)'s should be as similar as possible to each 
other. Differences in the shapes can be measured either by examining vari- 
ability in the <Vs from a target fig, P{Oi) = — /^e|| 2 , or by concentrating 

on the spread of the moments, P(/xz%) = YliY^k^ii^Z ~ ) 2 - Hence, we 
find the 0j's, "j^'s and the \Iq that minimize 

1 N 

8=1 

where A sync , A mom and Aw are tuning parameters that determine the im- 
pact of each term on the fit. A mom and A sync control the balance between 
the continuous monotone registration and landmark registration methods. 
Conceptually, setting A mom = and minimizing Q is very similar to the 
continuous monotone registration method of Ramsay and Li (1998). Alter- 
natively, setting A sync = and minimizing Q provides a type of generalized 
landmark registration. Note that including ||Y, — Yj|| 2 and P(fiZi) ensures 
that both (3) and (4) from our synchronization model will hold. 

For fixed minimizing (7) is relatively simple because we only need 
minimize Q individually over 'y i and 0{ . This suggests the following iterative 
algorithm: 

1. For fixed minimize Q over 'y i and 9i for i = 1. Repeat for i = 2, . . . , N. 

2. Set /* = -^£^0;. 

3. Repeat 1 and 2 until convergence. 

Step 1 involves a nonlinear optimization, but can be achieved with relative 
ease because we only need optimize over each curve individually and the 
derivatives of Q can be computed analytically. Note that we optimize over 
the /i^ fc ^ as part of step 1, that is, we do not fix [4% at the previous value 
offli. 1 
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Fig. 2. (a) A simulated set often curves that have been "warped" in time with the grey 
line indicating the original shape, (b) The estimates for Zi(t) with Async = Amom = Aw = 0. 
(c) Estimates for Zi(t) with A mom =0. (d) Estimates including all four terms. 



Figure 2 uses a simulated data set to illustrate the need for all four terms 
in (7). Figure 2(a) plots ten curves, each generated from the solid grey 
curve in the center and then "warped" by distorting the time axis by dif- 
fering amounts. Figure 2(b) illustrates the corresponding ten estimates for 
the Zi's, representing the "synchronized" curves, obtained by minimizing 
(7) with A sync = A mom = Aw = 0. The fit is very good, with the estimated 
standard deviation of the £ij's only 0.006, but this approach has clearly done 
a poor job of synchronizing the data. Alternatively, Figure 2(c) shows the 
results using A sync = 10, a small value for Aw and A mom = 0. A high level 
of synchronization has resulted from the use of P(0i), but the curves bear 
little relationship to the original ones. In addition, the Z^s have been shrunk 
toward zero, resulting in a ten fold increase in the standard deviation of the 
estimates. As A sync is reduced and Aw increased, the fit shifts toward that 
shown in Figure 2(b), but at no stage do we get strong synchronization, 
the correct shape and a good fit to the data. Finally, Figure 2(d) provides 
a plot of the ten estimated Z^s after setting A mom > and using two mo- 
ments corresponding to / max and / min . Notice that the addition of P(fiZi) 
has enabled us to not only synchronize the data but to also reproduce the 
original shape of the curves. In addition, the estimated standard deviation 
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is almost identical to that from the fit illustrated in Figure 2(b), indicating 
that the synchronization has not been at the expense of an accurate fit to 
the data. 

There are two reasons for the inadequate fit in Figure 2(c). First, because 
of the significant distortion of the observed Y^'s, the cross-sectional mean, 
which is used to compute is a poor estimate for the true shape, so the 
curves have been synchronized toward the wrong "target." This is the same 
problem that one would encounter when using the continuous monotone 
registration approach on this data. Second, the act of shrinking has resulted 
in a very poor fit to the original curves. Utilizing P{^Zi) has three advantages 
which allows us to address both these problems. First, since the moments are 
measures of shifts in the time axis, forcing the Zj's to have similar moments 
has no effect on their amplitude and, hence, does not cause the shrinkage 
problem observed in Figure 2(c). Second, the moments are only a summary 
of each curve so can often be much more accurately estimated than the entire 
curve. For example, the cross-sectional mean of the curves in Figure 2(a) is a 
poor estimate of the overall shape of the curves. However, ^™ ax and /i™ m still 
provide good estimates for the maximum and minimum of the original curve 
that the data was generated from, so the problem of aligning the curves to 
the wrong shape can be eliminated. Finally, one can choose among a wide 
range of feature functions when producing the moments. Hence, one can 
identify specific characteristics or features in the curves and design feature 
functions accordingly. Since feature functions can theoretically be designed 
to identify, and hence synchronize toward any consistent marker events, the 
landmark registration approach can be seen as a special case of the moments 
method. 

4.2. Asymptotic theory. Section 5 illustrates the moments method's prac- 
tical performance on the Berkeley growth curve data and Section 6 provides 
a comprehensive comparison to other methods on several simulated data 
sets. However, we can also show that, under general regularity conditions, 
the method exhibits good large sample properties in terms of asymptotic 
consistency of the estimators. Let rj represent the set of parameters for 
our model, that is, 7 1 ,...,7 Ar and 0%, . . . ,0n, and ff n the corresponding 
estimates from minimizing (7). Then we first introduce four assumptions: 

(A-l) [i^ 1 is a continuous function of 6i for all I and k. Also, z(Wi(tj)) is 
a continuous function of -y i . 

(A-2) z(W{t)) T Q is a uniformly continuous function of t, that is, for all 
5± > 0, there exists 82 > such that for all t\,t2, where |ii — t 2 \ < $2, 
it is the case that |z(iy(ti)) T - z(W(t 2 )) T 9\ < <5i for any 6 and 7. 

(A-3) We choose feature functions and corresponding moments such that 
the synchronization model given by (3) and (4) is identifiable when 
the curves are observed over a finite set of time points, t. 
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(A-4) The parameter space is bounded, that is, ||f7|| 2 < M for some finite 
M. 

We can not hope to have consistent estimators without (A-l) because that 
would imply that estimating fJ,^ and z(Wi(tj)) well did not necessarily 
correspond to estimating the true parameters well. (A-2) places a restriction 
on the lack of smoothness of the fit. Some level of smoothness must always 
be imposed on such fits or a line that interpolated the observed values of 
Y would minimize the criterion. (A-3) is obviously necessary because if the 
model is unidentifiable we could not select the correct parameters even if we 
had complete information. (A-4) assumes that the estimators are not allowed 
to diverge off to infinity. Subject to these four assumptions, we provide the 
following consistency result. 

Theorem 2. Let A sync . n ,Aw,n an d A mom ,n represent the tuning param- 
eters as a function ofn. Suppose that (A-l) through (A-4) hold, that A syncn 
and Aw,n are o(n) and that A mom ,n is 0{n). Then r) n will be a consistent 
estimator for rj , that is, f] n — > rj a.s. 

4.3. Selection of tuning parameters. A key component of our synchro- 
nization approach is the choice of the tuning parameters A sync , Aw and 
A mom . The choice of these parameters is governed by a tradeoff between 
quality of fit, that is, how well the estimated curves fit the observed data, 
the level of synchronization achieved and the amount of distortion to the 
shape in performing the synchronization. In general, improving performance 
in one of these characteristics will cause a deterioration in the other two. An 
analogy would be choosing between small probabilities of type 1 and type 2 
errors in hypothesis tests. Of course, the standard approach in that setting 
is to minimize the probability of a type 2 error subject to an upper bound 
constraint on the probability of a type 1 error. We take a similar approach 
here by selecting the tuning parameters to produce the best possible syn- 
chronization subject to constraints on the lack of fit and the distortion of 
the shape. 

We measure the level of synchronization, Sync, using the average squared 
deviation of the synchronized curves from their mean curve as a percentage 
of the same quantity for the unsynchronized curves. Hence, a value of zero 
would indicate an identical shape for all synchronized curves, while one 
corresponds to no improvement in the synchronization. The lack of fit, a, 
is quantified using the average standard deviation between the observed 
curves, Yi(tj), and their "estimates," Yi(tj). Finally, the distortion to the 
shape of the curves is measured using P(W). We then select the tuning 
parameters so as to minimize Sync subject to a and P(W) being less than 
certain upper bounds. Performing this optimization over three parameters is 




a potentially difficult computational task. Fortunately, the fit turns out to be 
fairly stable for wide ranges of possible values for Aw and A mom > while A sync 
has a considerably stronger influence. In the case of A mom it makes intuitive 
sense that its exact value is not important because the moments are acting to 
produce an identifiable result, so any reasonable weight will make the model 
identifiable and, hence, produce a good fit. Hence, it is feasible to implement 
a grid search over the three parameters where the grid for Aw and A mom is 
very coarse, while the grid for A sync needs to be considerably finer. For the 
growth curve data, illustrated in the following section, we use values of 10 3 , 
10 4 , 10 5 and 10 6 for A mom and values of HT 1 , 10° and 10 1 for A W - We have 
found these grids to work well for the problems we have examined. This is 
consistent with Ramsay and Li (1998) who also found that a small grid of 
tuning parameters worked over a wide range of applications. In theory cross- 
validation could be used to select the dimensions of the basis functions z and 
w. However, in practice we have found that, given the flexibility provided 
by the three tuning parameters, any dimension that provides a reasonably 
flexible basis will suffice. 

5. An application to the Berkeley growth curve data. In this section we 
demonstrate the moments based method on the Berkeley growth curve data, 
discussed in Section 1, utilizing the nonlinear warping functions, Wj, given 
by (6). The data were obtained by fitting a smoothing spline to the second 
differences of the observed heights for each of ten boys. The smoothing was 
performed to aid visualizing the resulting curves. We have also performed 
registration on the raw data with similar results. The first step in imple- 
menting our approach involves the choice of the feature functions. This data 
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exhibits clear global maximums and minimums so we elected to utilize I™ 8 * 
and I™ m with r = 100. For both feature functions we concentrated on the 
first moment, but one could also have used additional higher-order moments. 
Next we selected the tuning parameters using the approach from Section 4.3. 
Figure 3 provides an illustration of this method. Each plot contains 12 sep- 
arate lines corresponding to four different values for A mom (10 3 , 10 4 , 10 5 , 
10 6 ) and three different values for A w (10~\ 10°, 10 1 ). The 12 lines are 
almost indistinguishable from each other, emphasizing the insensitivity of 
the result to the exact choice of A mom and Aw- Figure 3(a) plots a as a 
function of A sync . Similarly, Figure 3(b) plots Sync as a function of A sync . 
Finally, Figure 3(c) plots Sync as a function of a. All three plots show a 
smooth tradeoff between a and Sync with little effect from the other two 
tuning parameters. We opted to use tuning parameters that produced the 
optimal synchronization subject to a being no larger than 0.1 and P(W) no 
greater than 0.5. These cutoffs were chosen because they seemed to produce 
a high level of synchronization with a relatively low increase in a. The dots 
on Figure 3 correspond to this fit (A sync = 0.2, Aw = 10, A mom = 10 5 ). We 
can see that attempting to further synchronize the curves past this point 
will result in a large increase in a. 

Figure 1(b) in Section 1 provides a plot of the synchronized curves, Zi, 
from the resulting fit. Notice that the synchronized mean curve not only 
appears to estimate the correct height for the peaks and troughs but also 
shifts the peak to a later age from that of the cross-sectional mean. To help 
judge the accuracy of our procedure, Figure 4 provides a comparison to 
other potential methods. Here we have plotted the estimated mean accel- 
eration curve using five different approaches. In particular, we applied our 
moments method using the above tuning parameters, the moments method 
with A mom = 0, landmark registration (aligning on the peak and the trough 
of each curve) , the continuous monotone registration method and the cross- 
sectional mean from the unaligned curves. The cross-sectional mean is well 
known to be inadequate for this data set [Gasser et al. (1984)]. However, 
the landmark method provides a natural gold standard for this problem be- 
cause it is known to work extremely well in situations such as this one where 
each curve exhibits a very similar structure [Kneip and Gasser (1992)]. All 
four methods give considerable improvements over the cross-sectional mean, 
but the moments method with A mom = 10 5 gives the most similar fit to the 
landmark approach. The continuous monotone registration method gives 
the worst performance of the four because it does not take advantage of the 
specific shape information in the data. Finally, the moments method with 
Amom = gives somewhat intermediate performance. While it does a good 
job correctly estimating the trough, it fails to identify the correct location of 
the peak. Again, this is because it fails to make full use of the structure that 
is present. This illustrates that, while the results are relatively insensitive 
to the choice of A mom i this term is still a vital part of the fit. 
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I I I I I I I 

6 8 10 12 H 16 1S 

Synchronized Age 

Fig. 4. Plots of mean curves on the Berkeley growth curve data using cross-sectional 
mean (dashed black), continuous monotone registration (dotted), moments method with 
Amom = (dashed grey) and A m0 m = W° (solid grey), and landmark registration (solid 
black). 



6. Simulation study. In this section we compare the performance of our 
moments based synchronization approach with the continuous monotone 
registration and landmark methods over four sets of simulations. For each 
simulation 100 data sets, each consisting of ten curves sampled at 100 equally 
spaced time points, were generated from a given distribution. Six different 
synchronization methods were then applied to each data set corresponding 
to the moments, continuous monotone registration and landmark procedures 
using both linear and standardized, (6), warping functions. For the moments 
method, we used K = 1 moment for each feature function. For each set of 
simulations, the A parameters were chosen by selecting the values that pro- 
vided maximum alignment on a preliminary data set subject to constraints 
on a and P(W) as discussed in Section 4.3. The simulation results are sum- 
marized in Table 1. Two numbers are provided for each simulation- method 
pair corresponding to Sync and a as defined in Section 4.3. For the moments 
method, a was produced using Zi(Wi(t)), while for the other two methods 
it was computed using a smoothing of the curves performed via a smoothing 
spline prior to synchronization. 

Simulation one consisted of curves generated from a standard Gaussian 
density which were then stretched and shifted in the X or time axis. Fig- 
ure 5(a) illustrates a typical set of curves. We used the peak of each curve 
as the marker event for the landmark methods and I max (t) (r = 100) for 
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Table 1 

Results from four simulations on six different alignment methods. Sync is measured as a 
percentage so 100 corresponds to no improvement in synchronization. The standard 
errors on the Sync were between 0.15 and 0.45 for those results marked with an * and 
were less than 0.15 for all others 



Simulation 



One Two Three Four 



W(t) 


Method 




Sync 


a 


Sync 


a 


Sync 


a 


Sync 






Cont. Mono. 


Reg. 


0.02 


0.0005 


76.50 


0.0003 


78.63* 


0.004 


75.37* 


0.009 


Linear 


Landmark 




11.86* 


0.0005 


1.15 


0.0003 


9.39 


0.004 


15.64 


0.009 




Moments 




0.07 


0.0013 


0.50 


0.0020 


8.33* 


0.028 


13.71* 


0.039 




Cont. Mono. 


Reg. 


0.06 


0.0005 


39.47 


0.0003 


21.55* 


0.004 


21.18 


0.009 


Nonlinear 


Landmark 




12.32* 


0.0005 


6.31 


0.0003 


2.86 


0.004 


1.42 


0.009 




Moments 




<0.01 


0.0013 


0.59 


0.0008 


0.76 


0.009 


1.20 


0.014 



the moments methods. For this simulation, the continuous monotone reg- 
istration and moments methods both worked very well. In particular, the 
continuous monotone registration method produced good results because 
the cross-sectional mean of the observed curves, used to produce the target 
function, still had an approximate bell shape. There was little difference be- 
tween the linear and nonlinear warping functions because the true warping 
was in fact linear. The landmark method, while still providing a consid- 
erable level of synchronization, performed relatively less well because, with 
only one marker, it could not adequately correct for differences in the spread 
of the curves. 

Simulation two had a similar set up to the previous simulation except that 
half the curves were centered close to 0.7, while the others were centered close 
to 0.3 [see Figure 5(b)]. As a result, the cross-sectional mean was bimodal, 
which significantly adversely affected the continuous monotone registration 
method. The landmark method performed relatively better on this data 
because shifts in the curve, which it could correct for, formed a larger portion 
of the lack of synchronization. The moments method was only marginally 
affected by the bimodal shape of the data. 

For simulation three we generated curves using the distribution illus- 
trated in Figure 2(a). These curves were produced using a nonlinear warping 
function and presented a more challenging problem. We utilized both the 
maximum and minimum points as markers for the landmark methods and 
J max (t) and I mm (t) (r = 100) for the moments methods. Again, the con- 
tinuous monotone registration method performed poorly because the cross- 
sectional mean did not adequately reflect the shape of the curves. The land- 
mark and moments methods both gave good results. For all three procedures 
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(a) (b) (c) 




0.0 0.2 0.4 D.« 0.8 1.0 0-0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0-8 1.0 

Time Tinw Tune 

Fig. 5. (a) A simulated set of ten curves that have been "warped" in time. This is one 
of 100 data sets from simulation one. (b) One of the 100 data sets from simulation two. 
(c) One of the 100 data sets from simulation four. For each plot, the thicker grey line 
indicates the original shape. 



the nonlinear warping functions worked considerably better than their linear 
counterparts. Finally, the fourth simulation tested out the effect of noise in 
the observed curves by adding Gaussian errors with standard deviation of 
0.01 to the data from simulation three [see Figure 5(c)]. We also added a 
linear drift in the curves to ensure that the moments method still performed 
well when the curves started and ended at differing values on the Y-axis. In 
general, these changes caused a moderate deterioration in the linear versions 
of the landmark and moments procedures, presumably because the drift in 
the curves made it harder for a linear warping function to accurately realign 
the curves. However, the nonlinear versions gave fairly similar performance 
to those of simulation three. Note that some improvement in the moments 
method results may have been possible if we had smoothed the curves be- 
fore applying our approach. However, given the small deterioration from 
simulation three, it is doubtful that any significant gains would have been 
achieved. 

These simulation results may be somewhat unfair to the landmark method 
because it is difficult to implement this approach in a truly automatic fash- 
ion. For example, by manually identifying additional landmarks in the sim- 
ulated curves, one may have been able to produce fits closer to that from 
the moments approach. However, our attempt here is not necessarily to 
show that our approach will outperform landmark registration where mul- 
tiple marker events can be manually identified, since landmark registration 
is considered the benchmark in this case. Rather, we want to show that the 
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moments method can give comparable results, without the need for manual 
intervention, when marker events are present, but can also provide accurate 
results even in the absence of such markers. 

Notice that because of the way that the moments method works its a 
was somewhat higher on all four simulations than for the continuous mono- 
tone registration or landmark methods. This is one of the tradeoffs for a 
higher level of synchronization. However, the increase is relatively small, par- 
ticularly for the nonlinear warping functions, so the tradeoff clearly seems 
worthwhile. Simulations two and three illustrate the advantage of combining 
landmark and continuous monotone registration criteria together. By first 
synchronizing based on landmarks, such as turning points, we can achieve a 
partial synchronization and then estimate /x well enough to produce a very 
accurate final alignment. In such situations we have found that the best re- 
sults are obtained by using a relatively higher value for A mom i n the first few 
iterations and then reducing A mom while increasing A sync in the remaining 
iterations. This is the approach we took for these simulations. 

7. Discussion. In this article we have developed a general moments based 
approach to the problem of synchronization of functional or curve data. The 
generally accepted benchmark for such problems is landmark registration 
which aligns curves by identifying marker events. This approach can be 
very effective but has two, potentially significant, disadvantages. First, it 
assumes all curves have consistent marker events and, second, even if the 
marker events exist, one often must manually identify them, which is not 
feasible for large data sets. Alternatively, the continuous monotone regis- 
tration method works well when an adequate target function, T(t), can be 
identified but fails when the data is poorly enough aligned that T(t) does not 
match the shape of the curves. The moments based approach builds on the 
strengths of both methods and reduces or eliminates their deficiencies. As 
with the landmark approach, for those curves with marker events, feature 
functions, such as J max (i) or I mm (t), can be implemented to synchronize 
based on these events. However, for curves, or data sets, that do not ex- 
hibit such markers, more global feature functions, such as I^ m \t), can be 
utilized. In this sense our method is an extension of landmark registration. 
When comparing to the continuous monotone registration approach, notice 
that Z(t) = z(t) T fig can be considered to be the analog of T(t) in that we, 
at least partially, synchronize the curves toward it. However, even in situa- 
tions where the cross-sectional mean provides a poor estimate for T(t) and, 
hence, the continuous monotone registration method fails, the moments will 
often induce an accurate enough initial synchronization that Z(t) will rep- 
resent the correct shape. Hence, as the method iterates through the fitting 
algorithm, the synchronization becomes better as opposed to the continuous 
monotone registration fit where no improvement may be possible. The data 
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from simulations two and three provide a good illustration of this effect. 
Hence, our approach can also be considered as an extension of continuous 
monotone registration. 

This method could be generalized in several directions. Although, in this 
article, we have only discussed one-dimensional curves, the moments ap- 
proach could potentially be extended to multidimensional data. The defi- 
nition of the feature function, I g (t), could easily be expanded to such data 
and hence the moments also. Equating the lower-order moments could then 
be achieved in a similar fashion to the one-dimensional case. The most sig- 
nificant challenge would seem to be dealing with higher-order moments on 
high-dimensional data where the number of cross product terms could be- 
come unmanageable. Another possible extension is to attempt to model the 
covariance of the 0j's, Var(0j) = 0. For example, P(6i) could be altered to 
include G using, P*{6i) = (0j — /j, e ) T Q~ l (6i — fi g ). There are several possi- 
ble ways to model O. The first, which we have effectively used in P(6i), is to 
take equal to a multiple of the identity matrix. One could also estimate 
at each iteration via the sample covariance, = jfY^i(@i ~ — Me) T - 

However, such an unconstrained estimate may be impractical if the dimen- 
sion of the Oi's is large. One solution would be to constrain the rank of 
and, hence, significantly reduce the number of parameters to estimate 
[James, Hastie and Sugar (2000)]. Another appealing alternative would be 
to design such that P*(0i) placed no penalty on values of Oi corresponding 
to constant vertical shifts of z(t) T 6i. This would mean that two curves that 
differed only by a constant vertical shift would be considered to be perfectly 
synchronized and would likely significantly reduce the undesirable shrinkage 
toward the mean that, for example, is evident in Figure 2(c). 

APPENDIX 

A.l. Proof of Theorem 1. First note that (2) implies that 



Ig((t-a)/b) I (t-a\ 
((—)/*> W- Ji g ((t- a )/b)dt~ b 9 \ b )■ 



1, ft-a 



dt 



Hence, 

t J -h((s-a)/b)= J *4(( S -a)/6)(*)d* = y t^I h 

= I (sb + a)Ih(s) ds = b J Ih(s)ds + a J Ih(s) ds = bp$ + a, 
where s = . Similarly, 

V { hls-a)/b) = / (* - b ^h ~ a) k h((s-a)/b)(t) dt 
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\{t-b^-afl h {^)dt 
{sb-b^ ) ) k I h {s)ds = b k $ ) . 

A.2. Proof of Corollary 1. First note that if lf(t) oc </>(g(t)), then 
i-^W^^T^^tT)' Next note that 
(8) d m g((t-a)/b) 1 (m] ft 

^ ' -J+m Urn, " 



so 4«La)/6)( t ) a l9 (m) (^)l oc 4 m) (^)- To show the result for /™ x note 
that 

t — a\ [ft — a 



9 



6 

and similarly for /™ in . Finally, by (8), 

dg((t - a)/b)/dt gW((t - a)/b)/b g^((t - a)/b) 



SO 



y/<?9((t ~ a)/b)/dt 2 ^gW((t-a)/b)/b 2 ^Jg( 2 )((t - a)/b) 
.local ( + \^ ov J x dg((t-a)/b)/dt 

J ^"'((' --)/") 

V yV 2| ((<-<W V 6 

A. 3. Proof of Theorem 2. First we state and prove a lemma. 

Lemma A.l. Suppose 

(9) sup|z(iy n (t)) T n -z(Wo(O) T 0oHO .a. 

and 

(10) t*l — ► /i^ a.s. for I = 1, . . . , L and k = I, . . . , Ki, 

where Z n (t) = z(t) T 6 n and W n and Wq respectively represent the warping 
functions evaluated at 7 n and -y . Then, provided (A-l), (A-3) and (A-4) 
hold, 9 n — > #o a - s - anc ^ Tn — * 7o a - s - 
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A. 3.1. Proof of Lemma A.l. Note we treat each curve individually, so 
we drop the subscript i and let r) = Q£) and i) n = (g™)- To reduce notation, 
let 

f(r),t) = z(W(t)) T e. 

First, note that (9) and (10) imply that there exists O* with P(0*) = 1 s.t. 
Vcj* GO*, 

(11) f(f]n(u*),t)^f( Vo ,t) Vt 
and 

(12) //i ,fc) (o;*)->/4' fc) for 1 = 1,. ..,L and k = l,...,K l . 

Now, suppose that ?) n does not converge a.s. to r) . This implies there exists 
$7 with -P(O) > s.t. Vw € O, f] n (co) does not converge to ?7 . Since the 
intersection of 0* and must be nonempty, we take a particular uj 6 O* H O. 
Then there exists an infinite subsequence n'(u;) and 5(w) > such that 

(13) \\Vn>(w){u)-Vo\\>Ku) 

for all n'{io). But recall that any bounded sequence must have a convergent 
subsequence. Hence, by boundedness of 7 and 6, (A-4), there must be a 
subsequence, n"(uj), of n'(io), and a t)*(oj), such that 

(1 4 ) *>*"(w)H-»»7*( w )- 

Let W* represent the warping function evaluated at 7*. Then, since z(W n ) 
is a continuous function of 7 n and fjS~ is a continuous function of # n [by 
(A-l)], / is continuous and, hence, (14) implies that 

(15) f(r, n „ H (u\t)^f(f,*(u),t) Vt 
and 

(16) fA' k) (w) -» for Z = 1, . . . , L and k = 1, . . . , K h 

Now, (11) and (15) imply that f(rj ,t) = f(f]*(u;),t) for all t, while (12) and 
(16) imply that n^ k ' = fi^'* for Z = 1, . . . , L and k = 1, . . . , Ki. By moments 
identifiability of the model, (A-3), this implies »7*(w) =»7o- by (13) and 
(14), \\r]*(uj) - rj Q \\ > 0, which is a contradiction. Hence, i) n — ► 770 a.s. 
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A. 3. 2. Proof of the theorem. Let r] represent the set of parameters for 
our model, that is, y 1 , . . . , 7^, 6\, . . . , 0jv, and 6^. Each curve is evaluated 
at n time points, ti,...,t n . Let 



-, N n 



n . 
1=13=1 



^ N L Ki 

b(rj) = Amom- E E E ^£ ~ ^4'*' 
j=l {=1 fc=l 



c(»?) = Eili - Mell 2 and d(t|) = EtiP(Wi), where Wy = W^fo). So 
<2n(r?) = o„(»/) + b(rj) + ^^0(77) + ^d(rj) represents (7) using n time 
points. Let rj represent the true parameters and fj n the estimators result- 
ing from minimizing Q n . Then Q n (rj ) = a n (r] ) + Asy ^ c -" c(r) ) + ^f-d(rj ), 
where c(rj ) and d(rj ) are both finite. Note c(rj ) is finite since is bounded 
and d(r] ) is finite because, by (5), < Wq. (t) < 00 for t £ [0, T], provided 
/o; (t) is bounded and this is the case because 7 . is bounded. Also, b(rj Q ) = 

because fi^'o = = ' ' ' = f^z'o = f° r an ^ an< ^ ^ where Zo t = 

z T Oo t . Clearly, Qn(Vn) ^ Qn(Vo) because r) n is optimized over all r\. Also, 
QniVn) ^ a n(^) n ) + &(*)n) because c and d are positive. Hence, 

(17) a n (fi n ) + fe(r) n ) < a n (rj ) + Asy ^ c,n c(T? ) + ^^d(rj ). 

Let ny = (z(Wo y ) r 0i -*{W niJ ) T e rH ) and e y = (1-,- - z(W 0ij ) T 0i ) , where 
Woj - = W(tj) evaluated using the true 7$ and Wny = using 73 from 

r? n . Then 

, JV n 
i=Xj=l 



N n 

1 E E(*tf - z (^ + z (^, - z (^) T ^ 



2 = 1 J=l 



1 V n 

( 18 ) = -EE(% + <M 2 



N n ^ N n ^ N n 

- E E 4 + - E E ^ + 2-EE ^ 

i=lj'=l i=lj=l i=l j=l 

, AT n. ^ TV n 

a n(Vo) + - E E <^ + 2- E E e «&»« • 
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Therefore, by (17) and (18), 

(!9) ^ E E < + 4 E E ^ + 6«J < ^T^o) + 

1=1 J=l 1=1 J=l 

But notice that the e^'s are i.i.d. mean zero random variables. Also, cj) ni . is a 
difference of two bounded uniformly continuous functions, so is also bounded 
and uniformly continuous. [Note z(W) T 9 is uniformly continuous by (A-2) 
and is bounded because it is a continuous function of bounded parameters, 
7 and 6, by (A-l) and (A-4)]. Hence, by a standard application of the SLLN, 
n Y^j=i £ ij ( t ) n i j — ► a.s. as n — > co. [See Theorem 1.13(h) in Shao (2003) for a 
proof of this result.] In addition, A synCjn and A\y,n are o(n), so the right-hand 
side of (19) also converges to 0. Therefore, it must be the case that 

, N n 

(20) -EE4^° a.s. foralH 

n i=ij=i 

and 

(21) b n (f) n )^0 a.s. 

Since A mom is 0(n), (21) implies that (10) in Lemma A.l must hold for each 
curve i. Finally, to show that (9) holds, we divide the time interval [0,T] 
into H equal sized regions R±, . . . ,Rh- Let n/j = n/H equal the number of 
time points in region h. Then, by (20), it must be the case that, for every 
<jj > 0, for large enough n, 

(22) — V \(pij\<uj a.s. 

But by uniform continuity, (A-2), there must be a 5h > such that 

(23) |(z(VMt)) T Oi - *{W ni (t)) T K) ~ <f>ij\ < $H 
for any t and tj in R^. Combining (22) and (23), we see that 

(24) MW 0i (t)f0 0i -z{W fH (t)) T K)\ <Sh + oj 

for any t € Rh and large enough n. But by making n large enough, this will 
apply simultaneously for all regions, so (24) will hold for all t. Now send 
n — > oo, H — > oo and n/H — > oo. Then — > co so cu can be made arbitrarily 
small, but also H — > oo so 5h can also be made arbitrarily small. Hence, (9) 
holds for each curve i. Therefore, the two conditions for Lemma A.l (9 and 
10) have been proved and, therefore, by Lemma A.l, the theorem has been 
proved. 
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